Abstract 

We analyze the time series of soccer matches in a model-free way using data for the German 
soccer league (Bundesliga). We argue that the goal difference is a better measure for the overall 
fitness of a team than the number of points. It is shown that the time evolution of the table during 
a season can be interpreted as a random walk with an underlying constant drift. Variations of 
the overall fitness mainly occur during the summer break but not during a season. The fitness 
correlation shows a long-time decay on the scale of a quarter century. Some typical soccer myths 
are analyzed in detail. It is shown that losing but no winning streaks exist. For this analysis ideas 
from multidimensional NMR experiments have been borrowed. Furthermore, beyond the general 
home advantage there is no statistically relevant indication of a team-specific home fitness. Based 
on these insights a framework for a statistical characterization of the results of a soccer league is 
introduced and some general consequences for the prediction of soccer results are formulated. 

PACS numbers: 89.20.-a,02.50.-r 
I. INTRODUCTION 



In recent years physicists have started to investigate time series resulting from successive
matches in sports leagues. In this context several basic questions can be asked. Is the
champion always the best team? [1-3] How many matches have to be played in a league
so that (nearly) always the best team becomes the champion? [1, 2] Does the distribution
of goals follow a Poisson distribution, and what are possible interpretations of the observed
deviations? [4, 5] These studies attempt to take a simplified view of
complex processes such as soccer matches in order to extract basic features such as
scaling laws. Some empirical observations, such as fat tails in the goal distributions, can be
related to other fields such as financial markets [6] and have been described, e.g., by the
Zipf-Mandelbrot law [7]. More generally, the analysis of sports events, e.g. from the
perspective of extreme value statistics, has successfully entered the domain of
physicists' activities [8].

A more specific view has been attempted in detailed studies of the course of a soccer
season. In one type of model, see e.g. Refs. [9-12], one introduces different
parameters to characterize a team (e.g. offensive fitness), which can be obtained via
Monte Carlo techniques. These parameters are estimated based on a Poisson assumption
about the number of goals scored by both teams. Within these models, which were mainly applied
to the English Premier League, temporal weighting factors were included to take into
account possible time variations of the different team parameters. These models aim
to predict the goals in individual matches. In [12] it is reported that, based
on a complex fitting procedure, the time scale of memory loss with respect to the different
variables is as short as 100 days.

A second type of model assumes just one fitness parameter for each team; the outcome
(home win, draw, away win) is then predicted by comparing the difference of the
team fitness parameters with some fixed parameters [13]. The model parameters are
estimated based on the results of the whole season. Here, no temporal evolution of the
team parameters is involved. This very simple model has been used in [14] to check whether
the outcome of one match influences the outcome of the next match. Of course, this
type of result is only relevant if the model used indeed captures the key ingredients of
real soccer matches correctly. Attempts have also been made to analyze individual soccer
matches on a very detailed level, e.g., to estimate the effect of tactical changes [15].

The approach taken in this work is somewhat different. Before devising appropriate
models, which will be done in subsequent work, we first use a model-free approach
to learn about some of the underlying statistical features of German soccer (1. Bundesliga).
However, the methods are general enough to be easily adapted to other
soccer leagues or even other types of sports. The analysis is based exclusively on the
final results of the individual matches. Since much of the earlier work in
this field originates from groups with a statistics or economics background, there is some room
for the application of complementary concepts that are more common in the physics community.
Examples are finite-size scaling, the analysis of two-time correlation functions, or the use of
more complex correlation functions to unravel the properties of subensembles, as used, e.g.,
in previous 4D NMR experiments [16-18].

Four key goals are pursued in this work. First, we ask which observables are appropriate
to characterize the overall fitness of a team. Second, using this observable we analyze the
temporal evolution of the fitness on different time scales. Third, we quantify statistical and
systematic features relevant for the interpretation of a league table and derive some general properties
of prediction procedures. Fourth, we check the validity of some soccer myths which are often
used in everyday soccer language, including in serious newspapers, but have never been
checked objectively. Does something like a winning or losing streak exist?
Do some teams have a specific home fitness during a season?

The paper is organized as follows. In Sect. II we briefly outline our data basis. The 
discussion of the different possible measures of the overall fitness is found in Sect. III. In the 
next step the temporal evolution of the fitness is analyzed (Sect. IV). In Sect. V it is shown 
how the systematic differences in the team fitness can be separated from the statistical 
effects of soccer matches and how a general statistical characterization can be performed. 
In Sect. VI we present a detailed discussion of some soccer myths. Finally, in Sect. VII we 
end with a discussion and a summary. In two appendices more detailed results about a few 
aspects of our analysis are presented. 
II. DATA BASIS 



We have taken the results of the German Bundesliga from
http://www.bundesliga-statistik.de. For technical reasons we have excluded the seasons
1963/64, 1964/65 and 1991/92, because in these seasons the league contained more or fewer
than 18 teams. Every team plays against every other team twice per season, once at home
and once away. Unless mentioned otherwise, we have used the results starting from the
season 1987/88. The reason is that in earlier years the number of goals per season was
somewhat larger, resulting in slightly different statistical properties.



III. USING GOALS OR POINTS TO MEASURE THE TEAM FITNESS? 



A. General problem 

Naturally, a strict characterization of the team fitness is not possible because human
behavior is involved in a complex manner. A soccer team tries to win as many matches
as possible during a season, and teams with a better fitness will be more successful
in this endeavor. As a consequence, the number of points $P$ or the goal difference $\Delta G$ can
be regarded as a measure of the fitness. In what follows, all observables are defined as the
average value per match.

In Sect. IV it is shown that, apart from fluctuations, the team fitness remains constant
during a season. Thus, in a hypothetical season in which the teams play each other infinitely
often, so that statistical effects are averaged out, the values of $P$ would allow a strict
ranking of the teams by quality. In this sense $P$ is a well-defined measure of the team
fitness during a season. The same holds for $\Delta G$ if the final ranking were based on the
goal difference. Since in reality the champion is determined from the number
of points, one might tend to favor $P$ to characterize the team fitness. In any event, one would
expect the rankings with respect to $\Delta G$ and $P$ to be identical in this hypothetical limit.

Evidently, the number of goals scored or conceded by a team in a match is governed
by many unforeseeable effects. This is one of the reasons why soccer is so popular. As a
consequence, the empirical values of $P$ or $\Delta G$ obtained, e.g., after a full season deviate
from the limiting values due to residual fluctuations. This suggests a relevant criterion to
distinguish between different observables: which observable displays the minimum sensitivity
to statistical effects? As will be shown below, this criterion favors the use of $\Delta G$.

Figure 1: The distribution of $\Delta G$ after one quarter of the season and after a full season. Included
is a fit with two Gaussian functions for both distributions. For the full-season distribution the
intensity ratio of the two Gaussian curves is approx. 1:6. The correlation coefficient for the latter is
0.985.
B. Distribution of $\Delta G$

In Fig. 1 we display the distribution of $\Delta G$ after one quarter of a season (averaging
over all quarters) and at the end of the season. The first case corresponds to $N = 9$ (first
and third quarter) or $N = 8$ (second and fourth quarter), the second case to $N = 34$. Here $N$
denotes the number of successive matches included in the determination of $\Delta G$.

Both distributions can be described as a Gaussian plus an additional wing at large $\Delta G$.
Fitting each curve by a sum of two Gaussians, the amplitude ratio for the full-season
distribution implies that there are on average 2-3 teams with an exceptionally good fitness.
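As a rough check of this estimate, assuming that the 1:6 intensity ratio quoted in the caption of Fig. 1 directly measures the relative number of teams in the two Gaussian components (an interpretation, not stated explicitly above), the count for an 18-team league follows as:

```latex
n_{\mathrm{exceptional}} \approx 18 \cdot \frac{1}{1 + 6} \approx 2.6
```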

Note that the distribution of $\Delta G$ is significantly narrower for larger $N$, and even for
$N = 34$ one expects some finite statistical contribution to the width of the distribution.
Qualitatively, this reflects the statistical nature of individual soccer matches: the
statistical contribution becomes less relevant when averaging over more matches. This
averaging effect will be quantified in Sect. V.
Figure 2: The correlation of $\Delta G$ for the first and the second half of the season. Included are the
respective averages together with the standard deviation, which on average is 0.51. Furthermore,
an overall regression line with a slope of 0.53 is included.

C. Correlation analysis 

A natural question is whether the distribution for $N = 34$ can be explained under
the assumption that all teams have identical fitness. In that case the outcome of
each match would be purely statistical, and no correlation between the goal differences of a
team in different matches would be found. To check this possibility in a simple manner, we
correlate the value of $\Delta G$ obtained in the first half of the season ($\Delta G_1$) with the value in
the second half for the same team ($\Delta G_2$). The results, collected for all years and all teams
of each year, are shown in Fig. 2. One observes a significant correlation. Thus, not surprisingly,
the fitness does vary between teams.

For a quantification of the correlation one can use the Pearson correlation coefficient

$$c_P(M_1, M_2) = \frac{\langle (M_1 - \langle M_1 \rangle)(M_2 - \langle M_2 \rangle) \rangle}{\sqrt{\langle (M_1 - \langle M_1 \rangle)^2 \rangle \, \langle (M_2 - \langle M_2 \rangle)^2 \rangle}} \tag{1}$$

to correlate two distributions $M_1$ and $M_2$. For the present problem it yields $0.55 \pm 0.03$. The
error bar has been determined by calculating $c_P(M_1, M_2)$ individually for every year and then
averaging over all years. This procedure is also applied in most of the subsequent analysis
and allows a straightforward estimation of the statistical uncertainty. The average value
$\langle \Delta G_2 \rangle$ for a given $\Delta G_1$ can be interpreted as the best estimate of the fitness, based on
knowledge of $\Delta G_1$. Note that the standard deviation of the distribution of $\Delta G_2$ for a given
$\Delta G_1$ is basically independent of $\Delta G_1$ and amounts to 0.51.
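A minimal sketch of this procedure, assuming that for every season the first-half and second-half values of $\Delta G$ are available as two lists with one entry per team. The container layout and function names are hypothetical, and turning the season-to-season scatter into an error bar via the standard error of the mean is an assumption; the text only states that the per-year coefficients are averaged.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient c_P(M1, M2) of Eq. (1)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def seasonal_correlation(seasons):
    """Average c_P over seasons; return (mean, standard error of the mean).

    `seasons` maps a season label to a pair of lists (first-half values,
    second-half values), one entry per team. The standard error is an
    assumed way of converting the season-to-season scatter into an error bar.
    """
    values = [pearson(first, second) for first, second in seasons.values()]
    return mean(values), stdev(values) / sqrt(len(values))


# Toy input with made-up numbers, only to show the expected data layout.
toy = {
    "season A": ([0.5, -0.2, 0.1, -0.4], [0.3, -0.1, 0.2, -0.5]),
    "season B": ([0.8, 0.0, -0.3, -0.5], [0.4, 0.1, -0.2, -0.3]),
}
print(seasonal_correlation(toy))
```

Computing $c_P$ per season before averaging keeps seasons with different overall scoring levels from mixing.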

There is a simple but, at first sight, astonishing observation. It turns out that a team
with a positive $\Delta G$ in the first half will on average also acquire a positive $\Delta G$ in the second
half, but with a smaller average value. This is reflected by the slope of the regression line being
smaller than unity. This observation is a manifestation of regression toward the mean,
which, however, is not always taken into account [3]. Qualitatively, this effect can
be rationalized by the observation that a team with a better-than-average value of $\Delta G$
very likely has a higher fitness but, at the same time, on average also had some good luck.
This statistical bias is, of course, not repeated during the second half of the season. For a
stationary process $\Delta G$ has the same statistical properties in the first and the second half.
Then the slope of the regression line is identical to the correlation coefficient (here: 0.53 vs.
0.55).
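The regression-toward-the-mean argument can be illustrated with a toy simulation. This is a sketch under simplifying assumptions that are not part of the analysis above: every team has a constant Gaussian fitness, and each half-season value of $\Delta G$ adds independent Gaussian noise. The variances are borrowed from the fit in Eq. (4) of Sect. V. For such a stationary process the regression slope and the correlation coefficient should both come out close to 0.55.

```python
import random
from math import sqrt


def simulate_halves(n_teams, var_fitness, var_stat, seed=0):
    """Draw a constant fitness per team and add independent noise to each half."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_teams):
        fitness = rng.gauss(0.0, sqrt(var_fitness))
        first.append(fitness + rng.gauss(0.0, sqrt(var_stat)))
        second.append(fitness + rng.gauss(0.0, sqrt(var_stat)))
    return first, second


def slope_and_correlation(x, y):
    """Least-squares slope of y on x, and the Pearson coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Fitness variance and statistical variance per half season (N = 17),
# taken from Eq. (4) purely for illustration.
first, second = simulate_halves(20000, 0.215, 3.03 / 17)
slope, corr = slope_and_correlation(first, second)
print(f"slope = {slope:.3f}, correlation = {corr:.3f}")
```

Because both halves have the same variance in this stationary toy model, the slope equals the correlation up to sampling noise, and both are smaller than one even though the fitness itself never changes.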

In a next step we have taken the observable $p(\Delta G = 2)$, which describes the probability
that a team wins a match with a goal difference of exactly two. Of course, this is also
a measure of the fitness of the team, but intuitively one would expect a large intrinsic
statistical variance, which should render this observable unsuited to reflect the team fitness
for the real situation of a finite season. One obtains a correlation coefficient of 0.19. In
agreement with intuition, observables which are strongly affected
by statistical effects display a lower correlation coefficient. Stated differently, the value of
$c_P(M_1, M_2)$ can be taken as a criterion for how well the observable $M$ reflects the fitness of a
team. This statement is further corroborated in Appendix I on the basis of a simple model
calculation. In particular, it is shown there that this statement holds whether or not the team
fitness changes during a season.

We have repeated the analysis for $P$, applying the present rule (3 points
for a win, 1 point for a draw and 0 points for a loss) to all years; the results are
basically identical when using the 2-point rule. Here we obtain $0.49 \pm 0.03$, which is smaller
than the value obtained for $\Delta G$. One might argue that both values still agree within
statistical errors. However, since the variation from season to season is very similar for both
correlation coefficients, the difference is indeed significant. A detailed statistical analysis yields
$c_P(\Delta G_1, \Delta G_2) - c_P(P_1, P_2) = 0.06 \pm 0.015$.

How can this difference be rationalized? A team winning 1:0 gets the same number of points
as a team winning 6:0. Whereas in the first case this may have been a fortunate win, in the
second case it is very likely that the winning team has been clearly superior. As a consequence,
the goal difference can identify very good teams, whereas the fitness variation among teams
with a given number of points is somewhat larger. Actually, using $\Delta G_1$ to predict $P_2$ is
also more efficient than using $P_1$ ($c_P(\Delta G_1, P_2) > c_P(P_1, P_2)$). One might wonder whether
the most informative quantity is a linear combination of $\Delta G$ and $P$. Indeed, the optimized
observable $\Delta G + 0.3P$ displays a larger value of $c_P$ than $\Delta G$ alone. The difference, however,
is so small ($\Delta c_P \approx 0.001$) that the additional information content of the points can be
neglected entirely.

Observable           $c_P$
$\Delta G$           0.55 ± 0.035
$P$                  0.49 ± 0.035
$p(\Delta G = 2)$    0.19 ± 0.06

Table I: Pearson correlation coefficients for different observables.

In conclusion, a final ranking in terms of goals rather than points is preferable if one
really wants to identify the strongest or weakest teams.

IV. TEMPORAL EVOLUTION OF THE FITNESS 

Having identified $\Delta G$ as an appropriate measure of the team fitness, one may ask to
what degree the team fitness changes with time. This will be analyzed on three different
time scales, now using all data starting from 1965/66.

We start with variations within a season. One may envisage two extreme scenarios
for the time evolution of the fitness during a season: first, a random walk in fitness space;
second, fluctuations around fixed values. These scenarios are sketched in Fig. 3.

To quantify this effect we divide the season into four nearly equal parts (9 matches, 8
matches, 9 matches, 8 matches), denoted quarters. The quarters are enumerated by an index
running from 1 to 4. In the random-walk picture one would naturally expect the correlation
between quarters 1 and $m$ ($m = 2, 3, 4$) to be the stronger the smaller $m$ is. For the
subsequent analysis we introduce the variable $n = m - 1$, indicating the time lag between
both quarters. In contrast, in the constant-fitness scenario no dependence on $n$ is expected.
The correlation coefficients, denoted $c_q(n)$, are displayed in the central part of Fig. 4. To decrease
the statistical error we have averaged over the forward direction (first quarter with the
$(n+1)$-th quarter) and the time-reversed direction (last quarter with the $(4-n)$-th quarter).
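A minimal sketch of how $c_q(n)$ can be computed, assuming the quarter values of $\Delta G$ are available as one 4-tuple per team and season. The function names and the synthetic data are hypothetical; the toy data implement the constant-fitness scenario of Fig. 3(b), in which all three lags should give essentially the same correlation.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient, Eq. (1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def quarter_correlation(quarters, n):
    """c_q(n) for lag n = 1, 2, 3, averaged over both time directions.

    `quarters` holds one tuple (q1, q2, q3, q4) of quarter-season Delta G
    values per team and season.
    """
    def column(k):  # quarter k (1-based) of every team
        return [q[k - 1] for q in quarters]

    forward = pearson(column(1), column(n + 1))   # first quarter vs. (n+1)-th
    backward = pearson(column(4), column(4 - n))  # last quarter vs. (4-n)-th
    return 0.5 * (forward + backward)


# Constant-fitness toy data: a fixed fitness per team plus independent
# noise in each quarter (illustrative variances, roughly 8.5 matches/quarter).
rng = random.Random(1)
quarters = []
for _ in range(20000):
    fitness = rng.gauss(0.0, sqrt(0.215))
    quarters.append(tuple(fitness + rng.gauss(0.0, sqrt(3.03 / 8.5)) for _ in range(4)))

for lag in (1, 2, 3):
    print(lag, round(quarter_correlation(quarters, lag), 3))
```

In a random-walk toy model the printed values would instead decrease with the lag, which is the distinction Fig. 4 tests.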
Figure 3: Two extreme scenarios for the time evolution of the fitness during a season. (a) The
fitness performs a random-walk dynamics under the only constraint that the fitness distribution
of all teams is (roughly) stationary. (b) The fitness of each team fluctuates around a predefined
value which is constant for the whole season.

Interestingly, no significant dependence on n is observed. The correlation between the first 
and the fourth quarter is even slightly larger than between the first and the second quarter, 
albeit within the error bars. Thus, the hypothesis that the fitness remains constant during 
a season (apart from short-ranged fluctuations) is fully consistent with the data. Of course, 
because of the residual statistical uncertainties of the correlations, one cannot exclude a 
minor systematic variation of the fitness. 

This analysis can be extended to learn about a possible fitness variation between
one season and the next or the previous one. More specifically, we correlate the fitness
in the first quarter of a given season with the quarters $m = 5, 6, 7, 8$ of the next season and
with the quarters $m = -3, -2, -1, 0$ of the previous season, and again plot the result as a function
of $n = m - 1$. The results are also included in Fig. 4. Interestingly, there is a significant
drop of correlation which, consistent with the previous results, does not change during the
course of the next or the previous season. Thus, by far most changes of a team's fitness
happen during the summer break rather than during a season. The
fact that the correlation with last year's result is weaker than with the present year's result
has already been discussed in [?], based on a specific model analysis.

Figure 4: The correlations between quarters, including the comparison between subsequent seasons.
$n$ denotes the difference between the quarter indices. For a closer description see text.

Finally, we have analyzed the loss of correlation between seasons $i$ and $i + n$. In order to
include the case $n = 0$ in this analysis, we compared $\Delta G$ determined for the first and the
second half of the season. Thus, for the correlation within the same season one obtains
one data point, whereas for the correlation between different seasons one obtains four data points,
which are subsequently averaged. $c_y(n)$ denotes the corresponding Pearson correlation coefficient,
averaged over all initial years $i$. We checked that for $n > 0$ we get the same shape of
$c_y(n)$ (just with larger values) when full-year correlations are considered. Of course, when
calculating the correlation coefficient between seasons $i$ and $i + n$, one only takes into account
teams which played in the Bundesliga in both seasons. However, even for large time differences,
i.e. large $n$, this number is significant (e.g. the number of teams playing both in the first season
analyzed in this study and in the season 2007/08 is as large as 11). This already indicates
that, given the large number of soccer teams in Germany which might potentially play in
the Bundesliga, a significant persistence of the fitness is expected, although many of these
teams may have been briefly relegated to a lower league in between.

The results are shown in Fig. 5. $c_y(n)$ displays a fast decorrelation at short times, which
slows down at longer times. To capture these two time regimes we have fitted the data by a
bi-exponential function (the parameters are given in the figure caption). This choice is motivated
by the fact that it is perhaps the simplest function which can quantify the $n$-dependence
of $c_y(n)$. The short-time loss has a time scale of around 2 years. This effect, however, only
has an amplitude of around 2/5 of the total. The remaining loss of correlation
occurs on a much longer scale (around 20-30 years). Apparently, there exist fundamental
properties of a team, such as its general economic situation, which change only on very
long time scales despite the short-term fluctuations of the team composition. As mentioned
above, this long-time correlation is also reflected by the small number of teams which
have played in the Bundesliga for a significant time during the last decades.

Figure 5: The fitness correlation when comparing $\Delta G$ for two seasons which are $n$ years apart.
The analysis is based on the comparison of half-seasons (see text for more details). The data are
fitted by $c_y(n) = 0.22 \exp(-n/1.7) + 0.34 \exp(-n/27)$.
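To make the two time scales concrete, the fitted function from the caption of Fig. 5 can be evaluated directly. This sketch only re-evaluates the quoted fit parameters and does not refit any data; the constant names are hypothetical. The short-time component carries 0.22/0.56, i.e. about 2/5, of the initial correlation.

```python
from math import exp

# Bi-exponential fit parameters as quoted in the caption of Fig. 5.
A_SHORT, TAU_SHORT = 0.22, 1.7   # amplitude, decay time in years
A_LONG, TAU_LONG = 0.34, 27.0


def c_y(n):
    """Fitted fitness correlation between seasons n years apart."""
    return A_SHORT * exp(-n / TAU_SHORT) + A_LONG * exp(-n / TAU_LONG)


short_fraction = A_SHORT / (A_SHORT + A_LONG)
print(f"c_y(0) = {c_y(0):.2f}, short-time share = {short_fraction:.2f}")
for n in (1, 5, 10, 25):
    print(n, round(c_y(n), 3))
```

After a few years the first term has essentially died out, so the remaining decay is governed by the 27-year component alone.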

V. STATISTICAL DESCRIPTION OF A SOCCER LEAGUE 
A. General 

Here we explicitly make use of the observation that the fitness does not change during
the season; in this section we will report another piece of supporting evidence
for this important fact. Hypothetically, this fitness could be obtained "experimentally" if a
season contained an infinite number of matches between the 18 teams. The fitness
could then be identified as the observable $\Delta G(N \to \infty)$ (abbreviated $\Delta G(\infty)$). The specific
value for team $i$ is denoted $\Delta G_i(\infty)$. We already know from the discussion of Fig. 2 that the
values $\Delta G_i(\infty)$ are distributed. As a consequence, the variance of $\Delta G(\infty)$, denoted $\sigma^2_{\Delta G}$,
is non-zero. Although it cannot be directly obtained from the league table (because of the
finite length of a season), it can be estimated by appropriate statistical means, as discussed
below. Because the number of goals and the width of the distribution of $\Delta G$ decreased
somewhat from the season 1987/88 onward compared with the earlier years, we
restrict the analysis in this section to the latter period.



B. Estimation of the statistical contribution 

Formally, the omnipresence of statistical effects can be written as

$$\Delta G_i(N) = \Delta G_i(\infty) + \Delta G_{i,\mathrm{stat}}(N). \tag{2}$$

In physical terms this corresponds to a biased random walk, i.e. a set of particles,
each with a distinct velocity (corresponding to $\Delta G_i(\infty)$) and some diffusive contribution
(corresponding to $\Delta G_{i,\mathrm{stat}}(N)$). We note in passing that, to a good approximation, the
amplitude of the statistical contribution does not depend on the value of the fitness, i.e. the
index $i$ in the last term of Eq. (2) can be omitted. Otherwise, the variance in Fig. 2 would
depend on the value of $\Delta G_1$.

Squaring Eq. (2) and averaging over all teams (the cross term vanishes because the statistical
contribution is uncorrelated with the fitness), one can write

$$\sigma^2_{\Delta G(N)} = \sigma^2_{\Delta G} + \sigma^2_{\Delta G(N),\mathrm{stat}}, \tag{3}$$

where the variances of the respective terms have been introduced. $\sigma^2_{\Delta G(N),\mathrm{stat}}$ is expected
to scale like $1/N$, since it is the variance of an average over $N$ approximately independent match
results, and will disappear in the limit $N \to \infty$. Thus, $\sigma^2_{\Delta G}$ can be extracted by
linear extrapolation of $\sigma^2_{\Delta G(N)}$ in a $1/N$ representation. We have restricted ourselves to even
values of $N$ in order to avoid fluctuations for small $N$ due to the differences between home
and away matches. To improve the statistics we have not only used the first $N$ matches of
a season but all sets of $N$ successive matches of a team for the averaging. This
reflects the fact that any $N$ successive matches have the same information content about
the quality of a team.
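The extrapolation can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the per-match goal differences of each team-season are available as lists, pools all windows of $N$ successive matches across team-seasons (rather than averaging per season), and fits $\sigma^2_{\Delta G(N)} = a + b/N$ by ordinary least squares over even $N$. The synthetic test data are generated with the values of Eq. (4) and are purely illustrative.

```python
import random
from math import sqrt


def window_means(matches, n):
    """Averages of Delta G over all windows of n successive matches."""
    prefix = [0.0]
    for x in matches:
        prefix.append(prefix[-1] + x)
    return [(prefix[k + n] - prefix[k]) / n for k in range(len(matches) - n + 1)]


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def extrapolate(team_seasons, ns):
    """Fit sigma^2(N) = a + b / N by least squares; return (a, b).

    `team_seasons` is a list of per-match Delta G sequences (one per team
    and season). All windows of every team-season are pooled for each N.
    """
    xs, ys = [], []
    for n in ns:
        pooled = [w for s in team_seasons for w in window_means(s, n)]
        xs.append(1.0 / n)
        ys.append(variance(pooled))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


# Synthetic team-seasons: constant fitness plus Gaussian per-match noise.
rng = random.Random(2)
seasons = []
for _ in range(3600):
    fitness = rng.gauss(0.0, sqrt(0.215))
    seasons.append([fitness + rng.gauss(0.0, sqrt(3.03)) for _ in range(34)])

a, b = extrapolate(seasons, range(2, 35, 2))
print(f"intercept ~ {a:.3f}, slope ~ {b:.2f}")
```

The intercept estimates the fitness variance and the slope the per-match statistical variance; for this constant-fitness toy data they should land near 0.215 and 3.03.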

One can clearly see in Fig. 6 that one obtains a straight line in the $1/N$ representation for
all values of $N$. We obtain

$$\sigma^2_{\Delta G(N)} = 0.215 + \frac{3.03}{N}, \tag{4}$$

i.e. $\sigma^2_{\Delta G} = 0.215$ and $\sigma^2_{\Delta G(N),\mathrm{stat}} = 3.03/N$. Generally speaking, the excellent linear fit in
the $1/N$ representation shows again that the team fitness remains stable during the season.
Otherwise one would expect a bending, because the first term in Eq. (3) would then also depend on
$N$; see again Appendix I for a more quantitative discussion of this effect. Of course, for this
statement it was important to include only successive matches of a team in the statistical
analysis.

Figure 6: The variance of the distribution of $\Delta G(N)$, averaged over all years. The straight line is
a linear fit.

Figure 7: Statistical contribution to the overall variance after $N$ matches. Included is the analysis
for the goal differences as well as for the points.

In Fig. 7 the relative contribution of the statistical effects to the variance, i.e.
$\sigma^2_{\Delta G(N),\mathrm{stat}} / (\sigma^2_{\Delta G(N),\mathrm{stat}} + \sigma^2_{\Delta G})$, is shown as a function of $N$. The result implies that, e.g.,
after the first match of the season ($N = 1$) approx. 95% of the overall variance is determined
by the statistical effect. Not surprisingly, the table after one match may be stimulating for
the leading team but has basically no relevance for the rest of the season. For $N \approx 14$
the systematic and the statistical contributions are equal. Interestingly, even at the end of the
season the statistical contribution to the total variance is still
as large as 30%.
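The crossover point and the end-of-season share follow directly from the fit parameters of Eq. (4). A minimal sketch; the constant and function names are hypothetical:

```python
SIGMA2_FITNESS = 0.215  # sigma^2_{Delta G}, Eq. (4)
STAT_COEFF = 3.03       # N * sigma^2_{Delta G(N),stat}, Eq. (4)


def statistical_fraction(n):
    """Share of the total variance of Delta G(N) that is due to chance."""
    stat = STAT_COEFF / n
    return stat / (stat + SIGMA2_FITNESS)


crossover = STAT_COEFF / SIGMA2_FITNESS  # N at which both contributions are equal
print(f"crossover at N = {crossover:.1f}")
print(f"statistical share after 34 matches: {statistical_fraction(34):.2f}")
```

This gives a crossover at $N \approx 14.1$ and a statistical share of about 0.29 after 34 matches, matching the values quoted above.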
Repeating the same analysis for the number of points $P$ yields a relation of the same form,

$$\sigma^2_{P(N)} = \sigma^2_P + \sigma^2_{P(N),\mathrm{stat}}, \qquad \sigma^2_{P(N),\mathrm{stat}} \propto 1/N. \tag{5}$$

In this case it takes even $N = 22$ matches until the systematic effects start to dominate. At the end of the
season the statistical contribution is as large as 36%. This shows again that $\Delta G$ is a better
measure of the fitness, because the random component in the final ranking is then somewhat
smaller.

C. Prediction of team fitness: General framework 

The previous analysis has shown that even for $N = 34$ there still exists a significant
random contribution. The next goal is to estimate the team fitness in a statistically consistent way
from knowledge of $\Delta G(N)$ (e.g. the final scores at the end of the season).
Formally, one wants to determine the conditional probability $p(\Delta G(\infty) \mid \Delta G(N))$.
This can be done using Bayes' theorem,

$$p(\Delta G(\infty) \mid \Delta G(N)) \propto p(\Delta G(N) \mid \Delta G(\infty)) \, q(\Delta G(\infty)). \tag{6}$$

Here $p(\Delta G(N) \mid \Delta G(\infty))$ is fully determined via Eq. (2) and corresponds to a Gaussian with
variance $\sigma^2_{\Delta G(N),\mathrm{stat}}$. The function $q(\Delta G(\infty))$ describes the a priori probability for the team
fitness. This distribution has already been discussed in Fig. 1: to first approximation we
saw a Gaussian behavior with small but significant deviations. One can show that a strictly
linear correlation between the estimated fitness (or the behavior in the second half of the
season) and $\Delta G(N)$ holds for a Gaussian distribution $q(\Delta G(\infty))$. Since, to a good
approximation, a linear correlation was indeed observed in Fig. 2, for the subsequent analysis
we neglect any deviations from a Gaussian by choosing $q(\Delta G(\infty)) \propto \exp(-\Delta G(\infty)^2 / 2\sigma^2_{\Delta G})$.
Of course, for a more refined analysis the non-Gaussian nature displayed in Fig. 1 could be
taken into account.

After reordering the Gaussians in Eq. (6) one obtains, after a straightforward calculation,

$$p(\Delta G(\infty) \mid \Delta G(N)) \propto \exp\left[-\frac{(\Delta G(\infty) - a_N \Delta G(N))^2}{2\sigma^2_{e,N}}\right] \tag{7}$$

with

$$a_N = \frac{\sigma^2_{\Delta G}}{\sigma^2_{\Delta G} + \sigma^2_{\Delta G(N),\mathrm{stat}}} \tag{8}$$

and

$$\frac{1}{\sigma^2_{e,N}} = \frac{1}{\sigma^2_{\Delta G(N),\mathrm{stat}}} + \frac{1}{\sigma^2_{\Delta G}}. \tag{9}$$

As discussed in the context of Fig. 2, $a_N$ is identical to the Pearson correlation coefficient
when correlating two subsequent values of $\Delta G$, each based on $N$ matches.

From Eqs. (8) and (9) one obtains $a_{N=17} = 0.55$ and $\sigma^2_{e,N=17} = 0.097$. As expected, $a_{N=17}$ is
identical to $c_P(\Delta G_1, \Delta G_2)$ and, within statistical uncertainties, to the slope of 0.53 in Fig. 2.
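Eqs. (8) and (9) are the standard result for combining a Gaussian prior with a Gaussian likelihood (a shrinkage estimator). A minimal sketch using the fit values of Eq. (4); the function names are hypothetical. It reproduces $a_{N=17} \approx 0.55$, $\sigma^2_{e,N=17} \approx 0.097$ and $a_{N=34} \approx 0.71$.

```python
SIGMA2_FITNESS = 0.215  # sigma^2_{Delta G}, Eq. (4)
STAT_COEFF = 3.03       # sigma^2_{Delta G(N),stat} = STAT_COEFF / N, Eq. (4)


def posterior(n):
    """Return (a_N, sigma^2_{e,N}) of Eqs. (8) and (9) for N matches."""
    var_stat = STAT_COEFF / n
    a_n = SIGMA2_FITNESS / (SIGMA2_FITNESS + var_stat)
    var_e = 1.0 / (1.0 / var_stat + 1.0 / SIGMA2_FITNESS)
    return a_n, var_e


def estimate_fitness(delta_g, n):
    """Posterior mean and standard deviation of Delta G(inf) given Delta G(N)."""
    a_n, var_e = posterior(n)
    return a_n * delta_g, var_e ** 0.5


for n in (17, 34):
    a_n, var_e = posterior(n)
    print(f"N = {n}: a_N = {a_n:.2f}, sigma^2_e = {var_e:.3f}")
```

The shrinkage factor $a_N$ pulls an observed goal difference toward the league average; the fewer matches have been played, the stronger the pull.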

Finally, we apply these results to the interpretation of the Bundesliga table at the end of
the season, i.e. for $N = 34$. Using Eq. (7), the estimator for $\Delta G(\infty)$ can be written as

$$\Delta G(\infty) = a_{N=34} \, \Delta G(N = 34) \pm \sigma_{e,N=34}. \tag{10}$$

For the present data this can be explicitly written as

$$\Delta G(\infty) = 0.71 \, [\Delta G(N = 34) \pm 0.36]. \tag{11}$$

Using standard statistical analysis one can, e.g., determine the probability that a team with
a better goal difference (i.e. $\Delta G_1 > \Delta G_2$) is indeed the better team. For the present
data it turns out that for $\Delta G_1 - \Delta G_2 = 0.36$ (corresponding to an absolute value of 12
goals after 34 matches) the probability is approx. 24% that the team with the worse goal
difference is nevertheless the better team.
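This probability can be sketched numerically. The sketch assumes that the posteriors of the two teams from Eq. (7) are independent Gaussians, which is our reading of the "standard statistical analysis" mentioned above rather than a documented procedure; the names are hypothetical.

```python
from math import erfc, sqrt

SIGMA2_FITNESS = 0.215  # sigma^2_{Delta G}, Eq. (4)
STAT_COEFF = 3.03       # N * sigma^2_{Delta G(N),stat}, Eq. (4)


def prob_wrong_order(diff, n=34):
    """Probability that the team with the smaller Delta G(N) has the higher fitness.

    Assumes independent Gaussian posteriors (Eqs. (7)-(9)) for the two teams,
    so the fitness difference has mean a_N * diff and variance 2 sigma^2_{e,N}.
    """
    var_stat = STAT_COEFF / n
    a_n = SIGMA2_FITNESS / (SIGMA2_FITNESS + var_stat)
    var_e = 1.0 / (1.0 / var_stat + 1.0 / SIGMA2_FITNESS)
    z = a_n * diff / sqrt(2.0 * var_e)
    return 0.5 * erfc(z / sqrt(2.0))  # upper tail of a standard normal


print(f"{prob_wrong_order(0.36):.2f}")  # difference of about 12 goals over 34 matches
```

With the fit values of Eq. (4) this gives about 0.24, consistent with the figure quoted in the text.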

In analogy, one can estimate from Eq. (5) that two teams which are 10 points apart after the
season are in the wrong order in the league table, with respect to their true fitness, with a
probability of 24%. Perhaps this figure reflects the strong random component in soccer even
more dramatically.



D. Prediction of team fitness: Application 

These results can be used to quantify the uncertainty when predicting $\Delta G_i(M)$ of team
$i$. More specifically, we assume that this prediction is based on the results
of the $N$ previous matches of team $i$. The variance of the estimate of $\Delta G_i(M)$ is denoted
$\sigma^2_{\mathrm{est}}(M, N)$. This notation reflects the fact that it depends on both the prediction time scale
$M$ and the information time scale $N$. To estimate $\Delta G_i(M)$ based on $\Delta G_i(N)$, two
uncertainties have to be taken into account. First, the uncertainty of estimating $\Delta G_i(\infty)$ is
characterized by $\sigma^2_{e,N}$. Second, even if $\Delta G_i(\infty)$ were known exactly, the statistical uncertainty
of $\Delta G_i(M)$ due to the finite $M$ is still governed by the variance $\sigma^2_{\Delta G(M),\mathrm{stat}}$. Thus,
one obtains

$$\sigma^2_{\mathrm{est}}(M, N) = \sigma^2_{e,N} + \sigma^2_{\Delta G(M),\mathrm{stat}}. \tag{12}$$

Figure 8: The function $17\,\sigma_{\mathrm{est}}(M = 17, N)$, describing the uncertainty for the prediction of the goal
difference during the second half of the season based on the knowledge of $N$ matches. Included is
the data point observed in Fig. 2.

For the specific choice $M = 17$, i.e. for the prediction of the second half of the season, the
standard deviation $17\,\sigma_{\mathrm{est}}(M = 17, N)$ of the estimator (expressed as an absolute number of
goals) is displayed in Fig. 8. First, we discuss the extreme cases. In the practically impossible
case that the fitness is exactly known (formally corresponding to $N \to \infty$) one obtains a
standard deviation of approx. 7 goals. In the other extreme limit, where no information is
available (i.e. $N = 0$), one obtains a value of approx. 10.5 goals. Thus the difference between
complete information and no information for the prediction of the second half of the season
is only 3.5 goals. Finally, for the interpretation of the results in Fig. 2 one has to choose
$N = 17$. As shown in Fig. 8, the observed standard deviation of $17 \cdot 0.51 \approx 8.7$ agrees well
with the theoretical value based on Eq. (12). The remaining deviation (8.7 vs. 8.9) might
reflect the non-Gaussian contributions to $q(\Delta G(\infty))$.
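The curve of Fig. 8 can be sketched from Eqs. (9) and (12) with the fit values of Eq. (4). This is an illustrative re-evaluation with hypothetical names, not the authors' code. Writing $1/\sigma^2_{e,N} = N/3.03 + 1/\sigma^2_{\Delta G}$ conveniently covers both limits: $N = 0$ returns the prior variance, and $N = \infty$ gives $\sigma^2_{e,N} = 0$.

```python
from math import inf, sqrt

SIGMA2_FITNESS = 0.215  # sigma^2_{Delta G}, Eq. (4)
STAT_COEFF = 3.03       # N * sigma^2_{Delta G(N),stat}, Eq. (4)


def sigma_est_goals(n, m=17):
    """m * sigma_est(M=m, N=n) from Eq. (12), expressed in goals."""
    var_e = 1.0 / (n / STAT_COEFF + 1.0 / SIGMA2_FITNESS)  # Eq. (9); n = 0 -> prior
    return m * sqrt(var_e + STAT_COEFF / m)                 # Eq. (12)


for n in (0, 17, inf):
    print(n, round(sigma_est_goals(n), 1))
```

This gives roughly 10.7, 8.9 and 7.2 goals for $N = 0$, $17$ and $\infty$, in line with the approximate values read off Fig. 8.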

From Eq. (5) one can estimate, in analogy to the above, that, based on the knowledge of the
points in the first half, the number of points in the second half can be estimated with a
standard deviation of approx. 6 points. Of course, according to our previous discussion, the
estimation would be slightly better if the value of $\Delta G$ rather than the number of points of
the first half were taken as input.
| League | $\langle G_\pm\rangle$ | $\sigma^2_{G_+}$ | $\sigma^2_{G_+,stat}$ | $\sigma^2_{G_-}$ | $\sigma^2_{G_-,stat}$ | $c_{+,-}$ |
|---|---|---|---|---|---|---|
| Bundesliga | 1.43 | 0.075 | 1.45 | 0.055 | 1.50 | 0.71 |
| Premier League | 1.29 | 0.075 | 1.40 | 0.060 | 1.40 | 0.85 |

Table II: Statistical parameters characterizing the Bundesliga (1995/96-2007/08) and the English Premier League (1996/97-2006/07).

E. Going beyond the team fitness $\Delta G$



So far we have characterized the fitness of a team by $\Delta G$. From a conceptual point of view the most elementary quantities are the number of goals $G_+$ scored by a team and the number of goals $G_-$ conceded by this team ($\Delta G = G_+ - G_-$). Correspondingly, $\langle G_\pm\rangle$ denotes the average number of goals per team and match, the brackets denoting the corresponding average. Since the subsequent analysis can also be used for prediction purposes, we restrict ourselves to the seasons since 1995/96, when the 3-point rule was introduced.

The above analysis, performed for $\Delta G$, can be repeated for $G_\pm$. The general notation reads ($M \in \{G_+, G_-\}$)

$$\sigma^2_M(N) = \sigma^2_M + \frac{\sigma^2_{M,stat}}{N}. \qquad (13)$$

The fitting parameters are listed in Tab. II. We note in passing that all statistical features described in this Section are observed in the English Premier League, too. For comparison, the resulting parameters are also included in Tab. II.
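The finite-size extrapolation behind these fitting parameters (a linear fit of the variance vs. $1/N$, extrapolated to $1/N = 0$) can be illustrated on synthetic data. The sketch below is a toy model and an assumption, not the procedure applied to the real matches: every team gets a constant Gaussian fitness with variance `sigma2_fit`, each match adds independent Gaussian noise with variance `a_stat`, and the inputs 0.075 and 1.45 are the Bundesliga $G_+$ entries of Tab. II as read from the flattened table. Function names are hypothetical.

```python
import numpy as np


def simulated_variances(sigma2_fit, a_stat, n_values, n_teams=20000, seed=0):
    """Variance over teams of the per-match average of an observable after N matches.

    Toy model (assumption): constant Gaussian fitness plus independent Gaussian
    match noise.
    """
    rng = np.random.default_rng(seed)
    fitness = rng.normal(0.0, np.sqrt(sigma2_fit), size=n_teams)
    noise = rng.normal(0.0, np.sqrt(a_stat), size=(n_teams, max(n_values)))
    matches = fitness[:, None] + noise
    return np.array([matches[:, :n].mean(axis=1).var() for n in n_values])


def extrapolate(n_values, variances):
    """Linear fit of sigma^2(N) vs 1/N; returns (intercept = sigma^2(inf), slope)."""
    slope, intercept = np.polyfit(1.0 / np.asarray(n_values, dtype=float), variances, 1)
    return intercept, slope


n_values = np.arange(1, 35)  # up to one season of 34 matches
var_n = simulated_variances(0.075, 1.45, n_values)
print(extrapolate(n_values, var_n))
```

The intercept recovers the fitness variance and the slope the single-match statistical variance, which is why a straight line in the $1/N$ representation is the natural diagnostic for a constant fitness.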

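For a complete understanding of the goal statistics one has to include possible correlations between $G_+$ and $G_-$, i.e.

$$c_{+,-}(N) = \frac{\left\langle \left(G_+ - \langle G_\pm\rangle\right)\left(\langle G_\pm\rangle - G_-\right)\right\rangle}{\sigma_{G_+}(N)\,\sigma_{G_-}(N)}. \qquad (14)$$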

This value reflects the correlation of a team's strength of attack and defence. Complete correlation means $c_{+,-}(N) = 1$. The statistical effects during a soccer match related to $G_+$ and $G_-$ are likely to be statistically uncorrelated. As a consequence one would not expect a significant $N$-dependence. Indeed, we have verified this expectation by explicit calculation of $c_{+,-}(N)$, which within statistical uncertainty is $N$-independent. We obtain $c_{+,-} = 0.71$.

This information is sufficient to calculate $\sigma^2_M(N)$ for $M \in \{\Delta G = G_+ - G_-,\ \Sigma G = G_+ + G_-\}$ via

$$\sigma^2_{G_+ \pm G_-}(N) = \sigma^2_{G_+}(N) + \sigma^2_{G_-}(N) \mp 2\,c_{+,-}\,\sigma_{G_+}\sigma_{G_-}.$$

One obtains $\sigma^2_{\Delta G}(N) = 0.22 + 2.95/N$ and $\sigma^2_{\Sigma G}(N) = 0.03 + 2.95/N$. The result for $\sigma^2_{\Delta G}(N)$ agrees very well with the data reported above for the time interval 1987/88-2007/08.
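The combination rule can be evaluated directly. This sketch assumes our reading of the flattened Table II for the Bundesliga ($\sigma^2_{G_+} = 0.075$, $\sigma^2_{G_-} = 0.055$, $1/N$ coefficients 1.45 and 1.50, $c_{+,-} = 0.71$); the function name is hypothetical. Because $c_{+,-}$ in Eq. (14) correlates $G_+$ with $-G_-$, the cross term enters the variance of $\Delta G$ with a plus sign and that of $\Sigma G$ with a minus sign, while the match-noise terms simply add since they are uncorrelated.

```python
import math


def fitness_variances(sig2_plus, sig2_minus, c):
    """Large-N variances of DeltaG = G+ - G- and SigmaG = G+ + G-.

    c is c_{+,-} as in Eq. (14), i.e. the correlation of G+ with -G-.
    """
    cross = 2.0 * c * math.sqrt(sig2_plus * sig2_minus)
    return sig2_plus + sig2_minus + cross, sig2_plus + sig2_minus - cross


d_g, s_g = fitness_variances(0.075, 0.055, 0.71)  # Bundesliga row (assumed reading)
stat = 1.45 + 1.50  # 1/N coefficients add: match noise in G+ and G- is uncorrelated
print(round(d_g, 2), round(s_g, 2), round(stat, 2))
```

The difference reproduces the value 0.22 for $\Delta G$ and the common $1/N$ coefficient 2.95; the fitness part of $\Sigma G$ comes out near 0.04 from the rounded inputs, far below that of $\Delta G$.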

Based on this detailed insight into the statistical nature of goals, several basic questions about the nature of soccer can be answered.

Are offence or defence abilities more important? The magnitude of the variance $\sigma^2_M$ is a direct measure for the relevance of the observable $M$. Since $\sigma_{G_+}/\sigma_{G_-} = 1.25 \pm 0.09 > 1$, the investment in good strikers may be slightly more rewarding. However, the difference is quite small, so that to a first approximation both aspects of a soccer match are of similar importance.

Do teams with good strikers also have a good defence? In case of a strict correlation one would have $c_{+,-} = 1$. The present value of 0.71 indicates that there is indeed a strong correlation. However, the residual deviation from unity reflects some team-dependent differences beyond simple statistical fluctuations. Interestingly, this correlation is significantly stronger in the Premier League, indicating an even stronger balance between offence and defence in Premier League teams.

Is the total number of goals of a team (i.e. $G_+ + G_-$) a team-specific property? On average this sum is 97 goals per season. Without statistical effects due to the finite length of a season, the standard deviation of this value would be just $34\,\sigma_{\Sigma G} \approx 6$, i.e. only a few percent. Thus, to a very good approximation, the number of goals scored on average by team $i$ is just given by $G_{+,i} = \langle G_\pm\rangle + \Delta G_i/2$ (analogously, $G_{-,i} = \langle G_\pm\rangle - \Delta G_i/2$).
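The arithmetic of this paragraph, written out as a small sketch. The inputs are the Bundesliga value $\langle G_\pm\rangle = 1.43$ and the large-$N$ variance 0.03 of $\Sigma G$ quoted above; the helper `expected_goals` is hypothetical and simply applies the approximation that $G_+ + G_-$ is the same for all teams.

```python
import math

MATCHES = 34           # matches per Bundesliga season
MEAN_GOALS = 1.43      # <G_pm>, goals per team and match (Bundesliga)
SIGMA2_SUM_INF = 0.03  # large-N variance of SigmaG per match, as quoted above

season_total = MATCHES * 2 * MEAN_GOALS          # average G+ + G- per team and season
season_sd = MATCHES * math.sqrt(SIGMA2_SUM_INF)  # spread between teams without match noise


def expected_goals(delta_g_per_match):
    """Split a team's per-match goal difference into goals scored and conceded,
    assuming G+ + G- equals its league average for every team."""
    return MEAN_GOALS + delta_g_per_match / 2, MEAN_GOALS - delta_g_per_match / 2


print(round(season_total, 1), round(season_sd, 1), expected_goals(0.5))
```

This gives about 97 goals with a spread of about 6, i.e. roughly 6%, and a team with $\Delta G = 0.5$ per match is expected to score about 1.68 and concede about 1.18 goals per match.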

VI. SOCCER MYTHS 

In typical soccer reports one can read that a team is particularly strong at home (or away), or that it is on a winning streak (Lauf in German) or a losing streak. Here we show that the actual data do not support these notions (except for the presence of losing streaks).

A. Home fitness 

One may ask the general question whether the overall fitness $\Delta G$ of a team fully determines its home fitness, i.e. the quality of the team when playing at home. If so, it would be useless and misleading to define a team-specific home fitness, because it would not be an independent observable but would simply follow from the overall fitness $\Delta G(\infty)$. For the present analysis we again use our standard data set starting from 1987/88.

Figure 9: The variance of $A(\Delta G)$, i.e. $\sigma^2_{A(\Delta G)}(N)$, vs. $1/N$. The straight line is a linear fit. The extrapolation to $N = \infty$ yields approx. $-0.003 \pm 0.016$.

To discuss the ability of a team to play at home as compared to playing away, we introduce $\Delta G_h(N)$ and $\Delta G_a(N)$ as the goal difference in $N$ home matches and $N$ away matches, respectively. Of course, one has $\Delta G_h(N) + \Delta G_a(N) = \Delta G(2N)$. The home advantage can be characterized by

$$A(\Delta G) = \Delta G_h - \Delta G_a. \qquad (15)$$

The average value $\langle A(\Delta G)\rangle$ is approx. 1.4, which denotes the improved home goal difference as compared to the away goal difference. This number also means that on average a team scores 0.7 more goals at home than away, whereas it concedes 0.7 more goals when playing away. We note in passing that the home advantage has been decreasing continuously with time: taking only the seasons since 1995/96, one gets, e.g., $\langle A(\Delta G)\rangle \approx 1.0$.

A team-specific home fitness could be characterized by $A(\Delta G)_i - \langle A(\Delta G)\rangle$. A positive value means that team $i$ is better at home than expected from the overall fitness $\Delta G$. Of course, one again has to consider the limit $N \to \infty$. Thus, in analogy to the previous Section, one has to perform a scaling analysis. After $N$ matches, $A(\Delta G)(N)$ will be distributed with a variance denoted $\sigma^2_{A(\Delta G)}(N)$. A positive value of the large-$N$ limit $\sigma^2_{A(\Delta G)}(\infty)$ reflects the presence of a home fitness. Otherwise the quality of a team for a match at home (or away) is fully governed by the overall fitness $\Delta G(\infty)$.

The $N$-dependence of $\sigma^2_{A(\Delta G)}(N)$ is shown in Fig. 9. To obtain these data one has to evaluate the appropriate expression for the empirical variance for this type of analysis, which is a slightly tedious but straightforward statistical problem. The statistical error has been estimated by performing this analysis for the individual years.
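As an illustration of the raw ingredients of this analysis, the sketch below computes $A(\Delta G)_i = \Delta G_h - \Delta G_a$ (per-match averages) for every team of one season, together with the team-specific deviation $A(\Delta G)_i - \langle A(\Delta G)\rangle$. The match-record format `(home_team, away_team, home_goals, away_goals)` and the function names are assumptions for illustration; the bias-corrected variance expression used for Fig. 9 is not reproduced here.

```python
from collections import defaultdict


def home_away_asymmetry(matches):
    """Per-team A(DeltaG) = DeltaG_h - DeltaG_a, using per-match averages.

    matches: iterable of (home_team, away_team, home_goals, away_goals), one season.
    """
    home, away = defaultdict(list), defaultdict(list)
    for h, a, gh, ga in matches:
        home[h].append(gh - ga)  # goal difference of the home team
        away[a].append(ga - gh)  # goal difference of the away team
    return {t: sum(home[t]) / len(home[t]) - sum(away[t]) / len(away[t])
            for t in home if t in away}


def home_fitness_deviation(asymmetry):
    """Team-specific home fitness A(DeltaG)_i - <A(DeltaG)> (raw, before any scaling)."""
    mean_a = sum(asymmetry.values()) / len(asymmetry)
    return {t: v - mean_a for t, v in asymmetry.items()}


season = [("X", "Y", 2, 0), ("Y", "X", 1, 1)]  # hypothetical two-match season
asym = home_away_asymmetry(season)
print(asym, home_fitness_deviation(asym))
```

For real data the raw deviations are dominated by statistical noise, which is exactly why the text extrapolates their variance to $N \to \infty$ instead of interpreting them directly.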

It becomes clear that the hypothesis $\sigma^2_{A(\Delta G)}(\infty) = 0$ is fully compatible with the data. Because of the intrinsic statistical error, one cannot exclude a finite value of $\sigma_{A(\Delta G)}(\infty)$ ($\sigma_{A(\Delta G)}(\infty) < 0.12$). This value is less than 10% of the average value $\langle A(\Delta G)\rangle = 1.4$. Thus, the presence of teams which are specifically strong at home relative to their overall fitness is, if at all, a very minor effect.

Although this result rules out the presence of a relevant team-specific home fitness, it may be illuminating to approach the same problem from a direct analysis of the whole distribution of $A(\Delta G)(N = 17)$. The goal is to compare it with the distribution one would expect for the ideal case where no team-specific home fitness is present. This comparison, which is technically somewhat involved, is deferred to Appendix II. It turns out that the residual home fitness can be described by a value of $\sigma_{A(\Delta G)} < 0.4$. This means that in particular the simple model sketched above is not compatible with the data. In summary, relative to the average home advantage of 1.4, any possible residual home fitness is a negligible effect.

In the literature it is often assumed that for a specific match of team A vs. team B one can a priori define the expectation values of goals $\xi_{A(h)}$ and $\xi_{B(a)}$ scored by the home team A and the away team B, respectively. In the approach of Ref. [12] one explicitly assumes $\xi_{A(h)} = f_{AB}\, c_h$ and $\xi_{B(a)} = f_{BA}\, c_a$ (using a different notation). Here $f_{ij}$ contains the information about the offence strength of team $i$ and the defence strength of team $j$. The information about the location of the match is only incorporated into the factors $c_h$ and $c_a$. This approach has two implicit assumptions. First, the fact that $c_h$ is team-independent is equivalent to the assumption that there is no team-specific home fitness. This is exactly what has been shown in this Section. Second, the average number of goals of, e.g., the home team is proportional to the average number expected in a neutral stadium. For convenience this number can be chosen identical to $f_{AB}$. Then $c_h > 1$ takes into account the general home advantage; the same holds for $c_a < 1$. Assuming the multiplicative approach, one has to choose

$$\frac{c_h}{c_a} = \frac{\langle G_\pm\rangle + \langle A(\Delta G)\rangle/4}{\langle G_\pm\rangle - \langle A(\Delta G)\rangle/4}, \qquad (16)$$

which for the present case yields $c_h/c_a \approx 1.45$.
In principle, one might also have added some fixed value to take into account the home advantage. Thus, the multiplicative approach is not unique. However, using the above concepts, one can show that this approach is indeed compatible with the data. For this purpose we introduce the observables $M \in \{G_{+,h}, G_{+,a}, G_{-,h}, G_{-,a}\}$, where $G_{\pm,h}$ denotes the number of goals scored and conceded by the home team. An analogous definition holds for $G_{\pm,a}$. In analogy to the above, one can calculate $\sigma^2_M$, obtained again from the $N \to \infty$ extrapolation of the respective observable. One obtains $\sigma^2_{G_+,h} = 0.089$, $\sigma^2_{G_+,a} = 0.044$, $\sigma^2_{G_-,h} = 0.033$ and $\sigma^2_{G_-,a} = 0.069$. If the properties of home and away goals are fully characterized by the factors $c_{h,a}$, one would expect $\sigma_{G_+,h}/\sigma_{G_+,a} = \sigma_{G_-,a}/\sigma_{G_-,h} = c_h/c_a$. The two ratios read 1.4 and 1.45, respectively, and are thus fully compatible with the theoretically expected value of 1.45. In case of an additive constant to account for the home advantage, one would have expected a ratio of 1, because then the distributions would just have been shifted.
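The consistency check in this paragraph is a short calculation: a multiplicative factor rescales standard deviations, so it is the square roots of the variance ratios that have to be compared with $c_h/c_a$. A minimal sketch using the four variances quoted above (the dictionary keys are ad hoc labels):

```python
import math

# Large-N variances quoted above for goals scored (+) and conceded (-)
# by a team playing at home (h) or away (a)
variances = {"G+,h": 0.089, "G+,a": 0.044, "G-,h": 0.033, "G-,a": 0.069}

# Multiplicative home factors rescale standard deviations, so the ratios of
# standard deviations (not of variances) should both equal c_h / c_a.
ratio_scored = math.sqrt(variances["G+,h"] / variances["G+,a"])
ratio_conceded = math.sqrt(variances["G-,a"] / variances["G-,h"])
print(round(ratio_scored, 2), round(ratio_conceded, 2))
```

Both ratios come out close to 1.45, whereas a purely additive home bonus would leave the widths unchanged and give ratios of 1.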

In practical terms this allows one to correct the results of soccer matches for the home advantage by dividing the number of goals in a match by $c_h$ and $c_a$, respectively. This correction procedure may be of interest in cases where one wants to identify statistical properties without being hampered by the residual home advantage. Using this procedure for the data points in Fig. 6, the data points for odd $N$ would also fall on the regression line. We just mention in passing that in the limit of small $\langle A(\Delta G)\rangle/\langle G_\pm\rangle$ and small $\sigma_{G_\pm}/\langle G_\pm\rangle$ (which in practice is well fulfilled), this scaling yields results similar to a simple downward shift of the home goals and upward shift of the away goals by $\langle A(\Delta G)\rangle/2$.

B. Streaks 

Identifying winning or losing streaks is somewhat subtle because one has to make sure that no trivial selection effects enter the analysis. Here is one example of such an effect: in case of a winning streak it is likely that during this period the team played against somewhat weaker teams and will subsequently, on average, play against somewhat stronger teams. Thus, to judge the future behavior of this team one needs a method which takes these effects into account in the simplest possible way. To obtain sufficiently good statistics, we use here our complete data set, starting from the season 1965/66.

The key question to be answered here is whether a winning or losing sequence stabilizes or destabilizes a team, or has no effect at all. If a winning sequence stabilizes a team, one may speak of a winning streak. Analogously, if a losing sequence destabilizes a team, one has a losing streak. In general, we have identified all sequences of $n$ successive matches with $n$ wins or $n$ losses. Of course, the actual length of the win or loss sequences may have been much longer. Having identified such a sequence, we have determined the probability that the team wins the $m$-th match after this sequence. This probability is denoted $p_{win}(m, n)$. This is sketched in Fig. 10 for the case $n = 4$.

Figure 10: Sketch of the definitions of $n$ and $m$ for the analysis of the possible existence of winning and losing streaks.

Figure 11: The probability $p_{win}(m = 1, n)$ to win after a team has won or lost $n$ times.

In a first step we analyze the winning probability in the next match, i.e. for $m = 1$. The data are shown in Fig. 11. In case of a winning sequence the probability to win increases with increasing $n$. The opposite holds for a losing sequence. Does this indicate that the longer the winning (losing) sequence, the stronger the (de)stabilization effect, i.e. that real winning or losing streaks emerge?

This question has already been discussed in Ref. [Jjj]. It was correctly argued that by choosing teams which have, e.g., won 4 times, one typically selects a team with a high fitness. This team will, of course, win with a higher probability than an average team (selected for $n = 0$). Thus the increase of the win probability with $n$ is expected even if no stabilizing effect is present. It would just be a consequence of the fitness distribution, i.e. of the existence of good and bad teams, as shown above. Only if all teams had the same fitness would the data of Fig. 11 directly indicate the presence of a stabilization and destabilization effect, respectively.

The key problem in this analysis is that the different data points in Fig. 11 belong to different subensembles of teams and thus cannot be compared. Therefore one needs an analysis tool in which a fixed subensemble is followed. The realization of this tool is inspired by 4D NMR experiments, performed in the 1990s by different groups to unravel the properties of supercooled liquids [16, H, Q]. The key problem there was to monitor the time evolution of the properties of a specific subensemble until it behaves again like the average. This problem is analogous to that of a soccer team selected because of $n$ wins or losses in a row.

This idea can be directly applied to the present problem by analyzing the $m$-dependence of $p_{win}(m, n)$, which directly reflects possible stabilization or destabilization effects. In case of a stabilization effect, $p_{win}(m, n)$ would be largest for $m = 1$ and then decay to some limiting value, which would be related to the typical fitness of that team after possible effects of the series have disappeared. In contrast, in case of a destabilization effect, $p_{win}(m = 1, n)$ would be smaller than the limiting value reached for large $m$. Note that in this way the problem of different subensembles is avoided. Furthermore, this analysis is not hampered by the fact that the opponents during the selection period of $n$ matches were most likely, on average, somewhat weaker teams. The limiting value has been determined independently by averaging $p_{win}(m, n)$ for $|m| > 8$, i.e. over matches far away from the original sequence. To improve the statistical quality, this average also includes the matches sufficiently far before the selected sequence (formally corresponding to negative $m$). Of course, only matches within the same season were taken into account. This limiting value is supposed to reflect the general fitness of a team during this season (now in terms of wins), independent of that sequence. In case of no stabilization or destabilization effect, $p_{win}(m, n)$ would not depend on $m$. This would be the result if playing soccer were just coin tossing without memory. To avoid any bias with respect to home or away matches, we only considered those sequences where half of the matches were home matches and the other half away matches ($n$ even). Furthermore, the data for $p_{win}(m, n)$ are averaged pairwise for subsequent $m$ (1 and 2, 3 and 4, and so on).
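The selection procedure can be sketched as follows. This is a simplified illustration, not the exact analysis of the paper: each team-season is assumed to be a chronological string of results such as `"WDLLW..."`, every window of $n$ identical results counts as one selected sequence (overlapping windows included), and the home/away balancing and the pairwise averaging over $m$ are omitted. The function names are hypothetical; $m$ must be nonzero, with negative $m$ counting matches before the sequence.

```python
from collections import defaultdict


def p_win_profile(team_seasons, n, m_values, kind="W"):
    """Estimate p_win(m, n): probability of winning the m-th match after a run of
    n consecutive results of type `kind` ("W" or "L"), within the same season."""
    hits, counts = defaultdict(int), defaultdict(int)
    for results in team_seasons:  # one chronological string per team and season
        for start in range(len(results) - n + 1):
            if results[start:start + n] != kind * n:
                continue
            end = start + n - 1  # index of the last match of the selected run
            for m in m_values:
                idx = end + m if m > 0 else start + m  # m < 0: before the run
                if 0 <= idx < len(results):
                    counts[m] += 1
                    hits[m] += results[idx] == "W"
    return {m: hits[m] / counts[m] for m in m_values if counts[m]}


def plateau(team_seasons, n, kind="W", cutoff=8):
    """Limiting value: win fraction over all matches with |m| > cutoff
    relative to every selected run, within the same season."""
    hits = counts = 0
    for results in team_seasons:
        for start in range(len(results) - n + 1):
            if results[start:start + n] != kind * n:
                continue
            end = start + n - 1
            for idx, r in enumerate(results):
                if idx > end + cutoff or idx < start - cutoff:
                    counts += 1
                    hits += r == "W"
    return hits / counts if counts else float("nan")
```

A destabilization effect then shows up as `p_win_profile(...)[1]` lying below `plateau(...)`, while a memoryless process gives no systematic $m$-dependence. Comparing each $m$ with a plateau taken from the same selected team-seasons is what removes the subensemble problem discussed above.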
Figure 12: The probability to win, $p_{win}(m, n = 2)$, after a sequence of $n = 2$ wins and losses, respectively. The broken lines indicate the range ($\pm 1\sigma$ interval) of the plateau value reached for large $m$.

Figure 13: Same as in the previous figure for $n = 4$. In addition, we have included data where only away matches of the teams are considered for the calculation of $p_{win}(m, n = 4)$ in case of a win sequence.

The functions $p_{win}(m, n)$ for $n = 2$ and $n = 4$ are shown in Figs. 12 and 13, respectively. For $n = 4$ a total of 374 win sequences and 384 loss sequences have been taken into account. For $n = 2$ one observes a small but significant destabilization after a loss sequence; it takes approx. 8 matches to recover. No effects are seen for the win sequence. More significant effects are visible for $n = 4$. For the loss sequence one observes that directly after the selected sequences, i.e. for $m = 1$ and $m = 2$, the winning probability is reduced by approx. 30% as compared to the limiting value. Thus for about 6 matches the teams play worse than normal. Surprisingly, a reduction of $p_{win}(m, n = 4)$ for small $m$ is also visible for the win sequence. Thus, there seems to be a destabilization rather than a stabilization effect. By restricting the analysis to the away matches after the selected sequence, this effect becomes even more pronounced; correspondingly, the effect is smaller for home matches. Unfortunately, $n = 6$ can no longer be analyzed because the small number of events makes the statistics too poor.

Figure 14: Analysis of loss and win sequences, using shuffled data.

Of course, a critical aspect in this discussion is statistical significance. For this purpose we have estimated the probability that, assuming Gaussian statistics, the average of the first four matches after a win sequence can be understood as an extreme statistical deviation from the final plateau value. This probability turns out to be smaller than $10^{-3}$. Furthermore, we analyzed shuffled data, i.e. data where for a given team in a given season the 34 matches are randomly reordered. The results for $p_{win}(m, n = 4)$, using one example of such an ordering, are shown in Fig. 14. As expected, no effect is seen. The observation that the plateau values are somewhat lower than in Fig. 13 just reflects the fact that the first data points (small $m$) in Fig. 13 are systematically lower than the respective plateau value; after shuffling, these matches are distributed over all $m$ and thus lower the plateau.
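The shuffling test amounts to a null model without memory: each team-season keeps its results but loses their order. A minimal sketch, assuming the same hypothetical result-string format (`"W"`, `"D"`, `"L"` per match) as above; the function name is hypothetical, and the seed makes one particular reordering reproducible.

```python
import random


def shuffle_seasons(team_seasons, seed=0):
    """Null model: randomly reorder the matches of every team-season independently,
    keeping each team's season record (numbers of W, D, L) fixed."""
    rng = random.Random(seed)
    shuffled = []
    for results in team_seasons:
        order = list(results)
        rng.shuffle(order)  # in-place random permutation of this season only
        shuffled.append("".join(order))
    return shuffled


original = ["WWWWLDLLWD" * 3 + "WLDW"]  # hypothetical 34-match season
print(shuffle_seasons(original, seed=42))
```

Running the streak analysis on such shuffled seasons should give an $m$-independent $p_{win}(m, n)$ up to statistical noise, which is the behavior reported for Fig. 14.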

Thus, we conclude that both a positive ($n = 4$) and a negative sequence ($n = 2, 4$) have a destabilizing effect. This means that losing streaks indeed exist, whereas there are no stabilization effects for positive sequences, invalidating the notion of a winning streak. Rather, destabilization effects occur after a longer winning sequence. This asymmetry between positive and negative sequences is already reflected by the asymmetry seen in Fig. 11.



Actually, the present results disagree with the statistical analysis in Ref. [21] for the Premier League. In that work it is concluded that sequences of consecutive results tend to end sooner than they would without statistical association. However, the presence of losing streaks has been clearly demonstrated above. The disagreement might be due to the different data set (Bundesliga vs. Premier League). However, one needs to take into account that in that work the results have been obtained within the framework of a specific model via Monte-Carlo simulations. The present analysis has the advantage that, first, it does not refer to any model about the nature of soccer and, second, it can be done without additional Monte-Carlo simulations. In a model-based approach, in contrast, possible artifacts of the model might hamper the interpretation of the data.

VII. DISCUSSION AND SUMMARY 

On a conceptual level we have used finite-size scaling methods to extract the underlying distribution of fitness parameters. It turns out that the goal difference is a better measure of the team fitness than the number of points. From a technical point of view, a key aspect was to analyze the $N$-dependence of observables such as $\Delta G$. This problem is analogous to the simple physical problem of random walks with a drift. The key results can be summarized as follows.

1.) The fitness of a team displays a complex temporal evolution. Within a season there are no indications for any variations (except maybe for day-to-day fluctuations around some average team fitness, which can only be identified via a single-match analysis; this is, however, beyond the scope of the present work). During the summer break a significant decorrelation is observed. This short-scale decorrelation stops after around 2 years, when approx. 40% of the fitness has changed (some teams becoming better, some worse). Interestingly, the remaining 60% of the fitness only decorrelates on an extremely long time scale of 20-30 years, which is close to the data window of our analysis. This shows that there are dramatic persistence effects, i.e. there are some underlying reasons why good teams remain good on time scales largely exceeding the lifetime of typical structures in a club (manager, coach, players etc.).

2.) For finite seasons (which, naturally, are realized in the actual soccer leagues) the fitness of a team can only be roughly estimated because of residual statistical fluctuations. However, by linear extrapolation of the variance of the team fitness one can identify the underlying variance one would (hypothetically) obtain for an infinite number of matches. Based on this, one can estimate the statistical contribution to the end-of-season table, which is quite significant (36% for points). This allows one to quantify, e.g., the relevance of the final league table in some detail.

3.) The overall fitness, defined via the goal difference $\Delta G$, is to a large extent the only characteristic of a team. In particular, there is no signature of a team-specific home fitness. We would like to stress that the definition of a home fitness is always relative to a single season. This means that if a team is strong at home in one year and weak in another year, this would nevertheless show up in the present analysis. Whenever a team plays better or worse at home than expected (measured via $\Delta G_h - \Delta G_a$), this effect can be fully explained in terms of the natural statistical fluctuations inherent in soccer matches.

4.) A more detailed view on the number of goals reveals that the quality of the offence and that of the defence of a team are strongly correlated. In case of a perfect correlation, their quality would be fully determined by the overall fitness. However, since the correlation is not perfect, differences do exist. Furthermore, the strength of attack is slightly more important for a successful soccer team than the strength of defence, albeit the difference is not large.

5.) It is possible to identify the impact of the home advantage on the final result. Stated differently, one can estimate the average outcome of a match one would obtain in a neutral stadium. This procedure may be helpful if data are taken as input for a statistical analysis.

6.) The notion of streaks, as used in soccer language, can only be confirmed in the case of a losing streak. This means that if a team has lost several times (we analyzed 2 and 4 times), there is a significant drop of its fitness as compared to the normal level, which is reached again sufficiently far away from that time period. Possible reasons may be related to psychological aspects as well as to persistent structural problems (such as heavily injured players). Surprisingly, no winning streak could be identified. Winning two times had no effect on the future outcome. Winning four times even reduced the fitness, in particular for away matches. This analysis had to be performed with care in order to avoid any trivial statistical effects. Possibly, this indicates an interesting psychological effect. In the literature one can find models for understanding the basis of human motivation. In one of the standard models, by Atkinson, a reduction of motivation may occur if the next problem appears either too difficult (after having lost several times) or too simple (after having won several times) [22]. However, since these types of sequences (for $n = 4$) of wins or losses are relatively rare, they are of very minor relevance for the overall statistical description of the temporal evolution of soccer matches. Since, furthermore, the effect of sequences decays after a few more matches (up to 8), these observations are consistent with the notion that the fitness does not change during a season (if averaged over a time scale of at least a quarter season).

Of course, a further improvement of the statistical analysis could be reached if further explanatory variables, such as ball possession [23], were included. It would be interesting to quantify the increase of the predictive power in analogy to the analysis of this work; see, e.g., Tab. I.

Whereas some of our results were expected, we had to revise some of our own intuitive views on how professional soccer works. Using objective statistical methods and appropriate concepts, mostly taken from typical physics applications, a view beyond common knowledge became possible. For a typical soccer fan, this statistical analysis will probably not change the belief that, e.g., his or her support will give the team the necessary impetus for the next goal and, finally, a specific home fitness. Thus, there may exist a natural, maybe even fortunate, tendency to ignore some objective facts about professional soccer. We hope, however, that the present analysis may be of relevance to those who like to see the systematic patterns behind a sport like soccer. Naturally, all concepts discussed in this work can be extended to other types of sports. Furthermore, an extension to single-match properties as well as a correlation with economic factors is planned for the future.

We would like to thank S.F. Hopp, C. Müller and W. Krawtschunowski for their help in the initial phase of this project as well as B. Strauss, M. Tolan, M. Trede and G. Schewe for interesting and helpful discussions. Furthermore, we would like to thank H. Heuer for bringing the work of Atkinson to our attention.



VIII. APPENDIX I 



Here we consider a simple model which further rationalizes the statement that observables with larger Pearson correlation coefficients (correlation between the first and second half of the season) are better measures for the fitness of a team. This holds independently of whether the true fitness changes during a season or remains constant. We assume that the true fitness of team $i$ at time $j$ ($j$ may either denote a single match or, e.g., the average fitness during the $j$-th half of the season) can be captured by a single number $\mu_{ij}$. Evidently, the true fitness $\mu_{ij}$ of team $i$ is not exactly known. The variance of the fitness, $\sigma^2_\mu$, is assumed to be time independent, which just reflects stationarity.

In the experiment (here: a soccer match) one observes the outcome $x_{ij}$, which may, e.g., correspond to the goal difference or the number of points of team $i$ at time $j$. We assume a Markovian process, i.e. the outcome at time $j$ is not influenced by the outcomes of previous matches. Naturally, $x_{ij}$ is positively correlated with $\mu_{ij}$. Without loss of generality we assume that $\langle \mu_{ij}\rangle_i = \langle x_{ij}\rangle_i = 0$. The index $i$ reflects the fact that the averaging is over all teams. For simplicity we assume a linear relation between $x_{ij}$ and $\mu_{ij}$, namely

$$x_{ij} = a(\mu_{ij} + \xi). \qquad (17)$$

Here $a > 0$ is a fixed real number and $\xi$ some noise, characterized by its variance $\sigma^2_\xi$. The noise reflects the fact that the outcome of a soccer match is not fully determined by the fitness of the teams but also includes random elements. This relation expresses the fact that a team with a better fitness will on average also perform better in its matches.

The key idea in the present context is to use the outcome of matches to estimate the team fitness. The degree of correlation between $x_{ij}$ and $\mu_{ij}$ is captured by the correlation coefficient

$$c_{x_j,\mu_j} = \frac{\langle x_{ij}\,\mu_{ij}\rangle_i}{\sqrt{\langle x_{ij}^2\rangle_i\,\langle \mu_{ij}^2\rangle_i}}. \qquad (18)$$

A large value of $c_{x_j,\mu_j}$ implies that the estimation of $\mu_{ij}$ based on knowledge of $x_{ij}$ works quite well. Thus, one may want to search for observables $x_{ij}$ with large values of $c_{x_j,\mu_j}$. Unfortunately, since $\mu_{ij}$ cannot be measured, this coefficient is not directly accessible from the experiment. The theoretical expectation reads (see Eqs. (17) and (18))

$$c_{x_j,\mu_j} = \sqrt{\frac{\sigma^2_\mu}{\sigma^2_\mu + \sigma^2_\xi}}. \qquad (19)$$

For a closer relation to the general experimental situation one has to take into account that the team fitness may change somewhat with time. This can be generally captured by the correlation factor

$$c_{\mu_j,\mu_{j+1}} = \frac{\langle \mu_{ij}\,\mu_{i,j+1}\rangle_i}{\sigma^2_\mu}. \qquad (20)$$

Experimentally accessible is the correlation of $x_{ij}$ for two subsequent time points $j$ and $j + 1$. A short and straightforward calculation yields (using Eq. (19))

$$c_{x_j,x_{j+1}} = c_{\mu_j,\mu_{j+1}}\left[c_{x_j,\mu_j}\right]^2. \qquad (21)$$

This result shows that, independent of a possible decorrelation of the true fitness $\mu$, observables $x$ with a larger correlation coefficient $c_{x_j,x_{j+1}}$ display a larger $c_{x_j,\mu_j}$, i.e. form a better measure for the true fitness $\mu$. This is the line of reasoning used to identify $\Delta G$ as a better fitness measure than the number of points, independent of whether or not $\Delta G$ changes during a season.
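Eq. (21) can be checked numerically with a small Monte-Carlo sketch of the model. All parameter values below are hypothetical, and the fitness memory is modeled with the linear ansatz of Eq. (22) together with the stationarity condition (23) introduced just below, so that $c_{\mu_j,\mu_{j+1}} = b$; the noise $\xi$ is drawn independently for each time point.

```python
import numpy as np

rng = np.random.default_rng(1)
n_teams = 200_000                # many teams, so sampling noise is small
a, b = 1.0, 0.8                  # hypothetical scale factor and fitness memory
sig2_mu, sig2_xi = 0.2, 0.3      # hypothetical fitness and noise variances

mu_j = rng.normal(0.0, np.sqrt(sig2_mu), n_teams)
eps = rng.normal(0.0, np.sqrt(sig2_mu * (1 - b**2)), n_teams)  # stationarity, Eq. (23)
mu_next = b * mu_j + eps                                       # linear ansatz, Eq. (22)
x_j = a * (mu_j + rng.normal(0.0, np.sqrt(sig2_xi), n_teams))  # Eq. (17)
x_next = a * (mu_next + rng.normal(0.0, np.sqrt(sig2_xi), n_teams))

c_xx = np.corrcoef(x_j, x_next)[0, 1]       # measurable: outcome vs outcome
c_mumu = np.corrcoef(mu_j, mu_next)[0, 1]   # hidden: fitness vs fitness
c_xmu = np.corrcoef(x_j, mu_j)[0, 1]        # hidden: outcome vs fitness
print(c_xx, c_mumu * c_xmu**2)              # the two should agree, Eq. (21)
```

With these inputs both printed numbers should be close to $0.8 \cdot 0.2/0.5 = 0.32$, while $c_{x_j,\mu_j}^2$ approaches $\sigma^2_\mu/(\sigma^2_\mu + \sigma^2_\xi)$ as in Eq. (19).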

To go beyond this key statement, we specify the loss of correlation of the true fitness via the simple linear ansatz

$$\mu_{i,j+1} = b\,\mu_{ij} + \epsilon. \qquad (22)$$

Here the noise term is characterized by the variance $\sigma^2_\epsilon$. For simplicity we assume that the random-walk type dynamics is identical for all teams. Stationarity is guaranteed exactly if

$$\sigma^2_\epsilon = \sigma^2_\mu\,(1 - b^2). \qquad (23)$$

Constant fitness naturally corresponds to $b = 1$ and $\sigma_\epsilon = 0$. Of particular interest for the present work is the average of $x_{ij}$ over $N$ times (e.g. $N$ matches if $j$ counts the matches). Here we define

$$X_i(N) = \frac{1}{N}\sum_{j=1}^{N} x_{ij}. \qquad (24)$$

The variance of this average, denoted $\sigma^2_X(N)$, can be calculated in a straightforward manner. The result reads

$$\sigma^2_X(N) = a^2\sigma^2_\mu\left[\frac{1+b}{(1-b)N} - \frac{2b\,(1-b^N)}{(1-b)^2 N^2}\right] + \frac{a^2\sigma^2_\xi}{N}. \qquad (25)$$

For $b = 1$ one obtains $\sigma^2_X(N) = a^2\sigma^2_\mu + a^2\sigma^2_\xi/N$. Thus, in the case of constant team fitness one gets a linear behavior in the $1/N$ representation, and the limiting value just corresponds to the variance of the team fitness (apart from the trivial factor $a^2$). This implies that by extrapolation one can obtain important information about the underlying statistics, as described by the true team fitness $\mu_{ij}$. This just reflects the fact that for sufficient averaging the noise effects become irrelevant. For $b < 1$, however, one has a crossover from that behavior to $\sigma^2_X(N) = a^2\sigma^2_\mu\,[(1+b)/(1-b)]/N + a^2\sigma^2_\xi/N$ for large $N$, thus approaching zero for large $N$. Since $\sigma^2_{\Delta G}(N)$ did not show any bending, we have concluded in the main text that the data do not indicate a decorrelation of the fitness within a single season.
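Eq. (25) follows from summing the fitness correlations $b^{|j-k|}$ implied by Eq. (22) over all pairs of times. The sketch below compares that closed form with a brute-force double sum, under the model assumptions of this appendix (stationary fitness, independent noise $\xi$ per time point); the function names and parameter values are hypothetical.

```python
import numpy as np


def var_X_closed(N, a, sig2_mu, sig2_xi, b):
    """Closed form of Eq. (25); b = 1 is the constant-fitness limit."""
    if b == 1.0:
        return a**2 * sig2_mu + a**2 * sig2_xi / N
    fit = (1 + b) / ((1 - b) * N) - 2 * b * (1 - b**N) / ((1 - b) ** 2 * N**2)
    return a**2 * sig2_mu * fit + a**2 * sig2_xi / N


def var_X_direct(N, a, sig2_mu, sig2_xi, b):
    """Brute force: average the fitness correlations b^|j-k| over all N^2 pairs."""
    j = np.arange(N)
    corr = b ** np.abs(j[:, None] - j[None, :])
    return a**2 * (sig2_mu * corr.sum() / N**2 + sig2_xi / N)


for b in (1.0, 0.99, 0.9):
    print(b, [round(var_X_closed(N, 1.0, 0.22, 2.95, b), 3) for N in (1, 10, 34, 1000)])
```

For $b = 1$ the values approach the constant $a^2\sigma^2_\mu$, whereas for $b < 1$ they bend toward zero at large $N$; this bending is the signature that was not observed for $\sigma^2_{\Delta G}(N)$.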



IX. APPENDIX II 



Figure 15: Analysis of the home fitness. The squares correspond to the actual distribution of $\Delta(\Delta G)$. This curve is compared with the estimates for $\sigma_{\Delta(\Delta G)} = 0$ and $\sigma_{\Delta(\Delta G)} = 0.4$. For more details see text.

Here we discuss in more detail the distribution of $\Delta(\Delta G)(N = 17)$ shown in Fig. 15. Of course, it has a finite width due to statistical effects. Our goal is to compare this distribution with a second distribution, which is generated under the assumption that no specific home fitness exists. For this purpose we have defined, for each team in a given season, the random variable $\Delta G_1 - \Delta G_2$. Here the first term contains the average of the goal differences of some 17 matches and the second term the average over the remaining 17 matches. The 34 matches were assigned to the two terms such that the number of home matches in the first term is 9 (or 8) and that in the second term is 8 (or 9), respectively. We then generated the distribution of $\Delta G_1 - \Delta G_2$. To remove the residual home effect (9 vs. 8 home matches), we shifted this curve so that its average value is 0. This procedure has been repeated for many different assignments of this kind and for all seasons. The resulting curve is also shown in Fig. 15. It reflects the statistical width of $\Delta(\Delta G)$ after a season if no home advantage were present, and it can be described very well by a Gaussian. Shifting this distribution by the value of the average home advantage yields an estimate of the distribution of $\Delta(\Delta G)$ for $\sigma_{\Delta(\Delta G)} = 0$. To be consistent with this procedure, we have generated the distribution of $\Delta(\Delta G)(N = 17)$ in an analogous way: we calculated this distribution for every individual season and shifted each curve so that its mean agrees with the overall mean. In this way we have removed a possible broadening of this curve due to the year-to-year fluctuations of the general home advantage.
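The resampling procedure described above can be sketched as follows. The data layout (a list of `(goal_difference, is_home)` pairs per team, 17 home and 17 away matches) and the function names are hypothetical; the sketch always gives the first half 9 home matches and then removes the resulting residual home effect by shifting the pooled values to mean zero, as in the text.

```python
import random


def split_difference(matches, rng, n_home_first=9):
    """One random split of a team's 34 matches into two halves of 17.

    `matches` is a list of (goal_difference, is_home) pairs containing 17
    home and 17 away matches. The first half receives n_home_first home
    matches, the second half the remaining 17 - n_home_first.
    Returns mean(first half) - mean(second half), i.e. Delta G_1 - Delta G_2.
    """
    home = [gd for gd, is_home in matches if is_home]
    away = [gd for gd, is_home in matches if not is_home]
    rng.shuffle(home)
    rng.shuffle(away)
    n_away_first = 17 - n_home_first
    first = home[:n_home_first] + away[:n_away_first]
    second = home[n_home_first:] + away[n_away_first:]
    return sum(first) / len(first) - sum(second) / len(second)


def null_distribution(seasons, n_splits=100, seed=0):
    """Pool Delta G_1 - Delta G_2 over many splits, teams and seasons.

    The pooled values are shifted to mean zero to remove the residual
    9-vs-8 home effect.
    """
    rng = random.Random(seed)
    values = [
        split_difference(team, rng)
        for season in seasons
        for team in season
        for _ in range(n_splits)
    ]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


if __name__ == "__main__":
    # purely synthetic data: 2 seasons x 18 teams, home matches slightly favored
    gen = random.Random(1)

    def fake_team():
        home = [(gen.choice((-2, -1, 0, 1, 2, 3)), True) for _ in range(17)]
        away = [(gen.choice((-3, -2, -1, 0, 1, 2)), False) for _ in range(17)]
        return home + away

    seasons = [[fake_team() for _ in range(18)] for _ in range(2)]
    null = null_distribution(seasons, n_splits=50)
    print(len(null))  # 2 * 18 * 50 = 1800
```

Adding the average home advantage to every pooled value then gives the estimate for $\sigma_{\Delta(\Delta G)} = 0$ that is compared with the data.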

In agreement with the discussion of Fig. 9, the estimate for $\sigma_{\Delta(\Delta G)} = 0$ agrees well with the actual distribution of $\Delta(\Delta G)$. By convolving this estimate with a Gaussian of variance $\sigma^2_{\Delta(\Delta G)}$, one can assess the sensitivity of this analysis. Choosing, e.g., $\sigma_{\Delta(\Delta G)} = 0.4$, one can clearly see that this choice is not compatible with the actual distribution of $\Delta(\Delta G)$. Thus, if a residual home fitness exists at all, it can only be described by a value of $\sigma_{\Delta(\Delta G)}$ significantly smaller than 0.4. In the main text we have derived an upper limit of 0.12.
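The sensitivity test amounts to a convolution of the $\sigma_{\Delta(\Delta G)} = 0$ estimate with a Gaussian of variance $\sigma^2_{\Delta(\Delta G)}$, which adds that variance to the width of the curve. The following sketch does this numerically on a uniform grid; the Gaussian shape, mean 0.3, and variance 0.09 used for the null curve are hypothetical stand-ins, not the values measured for the Bundesliga data.

```python
import math


def gaussian(x, var):
    """Normal density with mean 0 and variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def convolve_with_gaussian(xs, density, var):
    """Discrete convolution of a density sampled on the uniform grid xs with N(0, var)."""
    dx = xs[1] - xs[0]
    return [
        dx * sum(p * gaussian(x - y, var) for y, p in zip(xs, density))
        for x in xs
    ]


def mean_and_variance(xs, density):
    """First two moments of a density sampled on a uniform grid."""
    dx = xs[1] - xs[0]
    norm = dx * sum(density)
    mean = dx * sum(x * p for x, p in zip(xs, density)) / norm
    var = dx * sum((x - mean) ** 2 * p for x, p in zip(xs, density)) / norm
    return mean, var


if __name__ == "__main__":
    xs = [-4.0 + 0.02 * k for k in range(401)]
    # hypothetical sigma_Delta(DeltaG) = 0 curve: Gaussian, mean 0.3, variance 0.09
    null = [gaussian(x - 0.3, 0.09) for x in xs]
    broadened = convolve_with_gaussian(xs, null, 0.4 ** 2)
    print(mean_and_variance(xs, null))       # about (0.3, 0.09)
    print(mean_and_variance(xs, broadened))  # about (0.3, 0.25)
```

The mean is unchanged while the variance grows from 0.09 to about 0.25, which is why a value as large as 0.4 produces a visibly wider curve than the data.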